BACKGROUND OF THE INVENTION
[Field of the Invention] 

This invention relates to an image processing 
device and an image processing method to convert or 
detect images by performing a predetermined 
processing of a two-dimensional image. 
[Description of the Related Art] 

Conventionally, in order to convert or detect
images by processing a two-dimensional image,
processing is performed pixel by pixel, and for each
pixel the data of the plural pixels surrounding that
pixel are processed.

Specifically, as shown in Fig. 9, a two-dimensional
image is composed of a large number of pixels 101
arranged in a matrix. For each pixel 101, a kernel
block is formed of the pixel 101 and the eight
neighboring pixels 102 surrounding it. Coefficients
A_0 to A_8 are multiplied by the corresponding data
X_0 to X_8 of the pixels in the kernel, and the sum of
the products,

A_0 X_0 + A_1 X_1 + ... + A_8 X_8,

is obtained as the processed data of the pixel 101. By
shifting the kernel to each pixel in turn, this series
of operations is performed for every necessary pixel.
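
By way of illustration only, the conventional
processing of Fig. 9 may be sketched as follows. The
function name, the list-of-rows image representation,
and the skipping of border pixels are assumptions made
for this sketch and are not taken from the
specification.

```python
def convolve_3x3(image, coeffs):
    """Conventional per-pixel processing with a 3x3 kernel.

    image  : list of rows of pixel values
    coeffs : 3x3 list of coefficients
    Border pixels are left at 0 (an assumption; the text does not
    say how borders are handled).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            # Multiply each of the nine pixels in the kernel by its
            # coefficient and accumulate the products.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += coeffs[dy + 1][dx + 1] * image[y + dy][x + dx]
            out[y][x] = acc
    return out
```

Every output pixel requires nine multiplications and
nine pixel reads in this form.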

However, the above-described image processing
method, in which computation is performed for every
necessary pixel, results in an extremely large volume
of computation, a heavy computational burden, and
high power consumption. More specifically, each time
the calculation is performed, the necessary pixel
data have to be transferred from a storage to a
processor, and the data of all the neighboring pixels
in the kernel have to be loaded. In addition, as the
kernel scans across the two-dimensional image, the
same pixel is accessed repeatedly, which is a serious
problem.
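
The repeated access can be made concrete with a small
counting sketch. The function below is hypothetical
and simply tallies how often each pixel is read when a
k x k kernel is scanned over an image in the
conventional manner, with border positions skipped as
an assumption.

```python
from collections import Counter

def count_pixel_reads(height, width, k=3):
    """Count reads per pixel when a k x k kernel visits every
    position at which it fits entirely inside the image."""
    reads = Counter()
    r = k // 2
    for y in range(r, height - r):
        for x in range(r, width - r):
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    reads[(y + dy, x + dx)] += 1
    return reads
```

For a 3x3 kernel, an interior pixel is read nine times,
once for each kernel position that covers it.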



SUMMARY OF THE INVENTION 
In order to solve the above-described problems,
the present invention aims to provide an image
processing device and an image processing method that
allow image processing without loss, using a
relatively simple combination of equipment, in an
extremely short time, and with low power consumption.

The image processing device according to the
present invention performs image processing in which
a two-dimensional image is composed of a group of
pixel data arranged as a matrix of plural pixel data,
and includes a plurality of storages and a calculator.
The storages are structured as follows. The group of
pixel data is divided into small blocks, each formed
of plural pixel data, and plural small blocks further
form a large block, within which each small block is
defined and arranged by certain rules. Each of the
small blocks located according to the rules is stored
independently, and by specifying an address assigned
to a small block, the plural pixel data within that
small block can be read out simultaneously. The
calculator multiplies the pixel data, which are
included in the small blocks composing one large block
and are read out from the plural storages, by the
coefficient matrix rearranged into a predetermined
order.

The image processing method according to the
present invention is a method of performing image
processing in which a two-dimensional image is formed
of a group of pixel data arranged as a matrix of
plural pixel data. In the method, the group of pixel
data is divided into small blocks, each formed of
plural pixel data, and plural small blocks further
form a large block, within which each small block is
defined and arranged by certain rules. Each of the
small blocks located according to the rules is stored
independently in its own storage, and by specifying
an address assigned to a small block, the plural pixel
data within that small block can be read out
simultaneously from the storage. The pixel data, which
are included in the small blocks composing one large
block and are read out from the plural storages, are
multiplied by the coefficient matrix rearranged into a
predetermined order, and the products are summed up.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a block diagram showing a schematic 
configuration of an image processing device according 
to a present embodiment; 

Fig. 2 is a schematic view showing a group of 
pixel data of the image processing device according 
to the present embodiment; 

Fig. 3 is a schematic view showing pixel data of 
a small block being stored in each memory cell; 

Fig. 4 is a schematic view showing a coefficient 
matrix controller in detail; 

Fig. 5 is a schematic view showing a state in 
detail in which each pixel data is multiplied by a 
coefficient matrix;

Fig. 6 is a schematic view showing an optimal 
relationship between a size of a kernel and 
small/large blocks;

Fig. 7 is a schematic view showing a 
configuration of an adding section in detail; 

Fig. 8 is a schematic view showing a structure of
the kernel in detail; and 

Fig. 9 is a schematic view showing a conventional 
image processing method. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
The preferred embodiments of the present 
invention will now be described with reference to the 
drawings.

Fig. 1 is a block diagram showing a schematic
configuration of an image processing device according
to a preferred embodiment of the present invention.
This image processing device includes: a plurality of
individual memory cells, four in this case, which are
SRAMs A, B, C, and D; a decoder 11 for accessing the
pixel data in these memory cells; a coefficient
matrix controller 12 for computational processing of
the pixel data read out from the memory cells; an
adding section 13 provided in the neighborhood of
each memory cell to add up the computed results of
each pixel (the coefficient matrix controller 12 and
the adding sections 13 form a calculator); and an
entire adding section 14 to further add up the
results obtained by each adding section 13.

As shown in Fig. 2, in the image processing
device, a two-dimensional image is composed of a
group of pixel data arranged as a matrix of plural
pixel data, and these pixel data are divided as
follows. First, the group of pixel data is divided
into small blocks composed of plural pixel data; for
example, each small block is composed of 4×4 pixels.
Next, a plurality of small blocks, 2×2 blocks for
example, form a large block. Here, in each large
block, each of the small blocks is defined and
arranged by certain rules. For example, the four
small blocks forming each large block are defined
according to their locations and specified as A_ij,
B_ij, C_ij, and D_ij (i, j = 1, 2, 3, ...). Here, the
number of the memory cells should be the same as, or
more than, the number of the small blocks forming
each large block.

Subsequently, as shown in Fig. 3, the small blocks
A_ij, B_ij, C_ij, and D_ij forming each of the large
blocks are stored in the SRAMs A, B, C, and D,
respectively. Here, each memory cell stores a pixel
data row (in this case a row of 16 data) as one unit,
and can read out a whole stored data row
simultaneously by specifying its one address.
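
The storage arrangement described above may be
sketched as follows. The placement of A, B, C, and D
within a large block (upper left, upper right, lower
left, lower right), the use of the large-block index
as the address, and all names in the code are
assumptions made for illustration; the specification
only requires that the small blocks be arranged by
fixed rules.

```python
M = 4          # a small block is M x M pixels
L = 2          # a large block is L x L small blocks
BANKS = "ABCD" # one memory cell (SRAM) per small block position

def store_in_banks(image):
    """Distribute an image over four banks.

    Returns {bank: {address: [16 pixel values]}}, where the address
    is the (row, column) index of the large block and each stored
    row holds one 4x4 small block in row-major order.
    """
    h, w = len(image), len(image[0])
    banks = {name: {} for name in BANKS}
    for by in range(h // M):
        for bx in range(w // M):
            # The bank depends only on the position of the small
            # block inside its large block.
            bank = BANKS[(by % L) * L + (bx % L)]
            address = (by // L, bx // L)
            row = [image[by * M + y][bx * M + x]
                   for y in range(M) for x in range(M)]
            banks[bank][address] = row
    return banks
```

Reading one address from each of the four banks then
yields all 64 pixels of one large block.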

It should be noted that the corresponding bits of
the pixel data forming a small block are preferably
arranged close together in each memory cell, so that
they can be added up in the adding operation
described later. In this way, the number of wires in
the adding section 13 can be reduced. In addition, by
further dividing each memory cell into groups, the
read-out speed can be further enhanced. Furthermore,
upon the read-out of each pixel data, the bit length
of the pixel data can be modified by masking certain
bits thereof.

On the other hand, as shown in Fig. 4, the
coefficient matrix controller 12 consists of a kernel
register 21, which is a coefficient storage section
that stores a certain coefficient matrix, and a 2D
shifter 22, which is a coefficient matrix converting
section that rearranges the coefficient matrix into a
predetermined order and matches the coefficients to
the above-described pixel data.

The kernel register 21 includes a coefficient
matrix which corresponds to a part of a group of
pixel data of a two-dimensional image, forming a
kernel C1. The coefficient matrix is formed of
predetermined coefficients, which are the three
coefficients -1, 0, and 1 in this case, one example
of which is shown in Fig. 4.

It is noted that multiplication by -1
(subtraction) uses two's complement. The calculation
using two's complement is achieved by adding the
bit-flipped value of each pixel data having a
coefficient of -1, and then adding the number of -1s
to an appropriate bit position of the result of the
addition.
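
The two's complement handling described above may be
sketched as follows. The function name, the 8-bit
default pixel width, and the choice of accumulator
width are assumptions made for this sketch; it shows
only that -X can be obtained as the bit-flipped X plus
1, with the "+1" terms collected and added once.

```python
def ternary_dot(pixels, coeffs, bits=8):
    """Sum coeffs[i] * pixels[i] for coefficients in {-1, 0, 1}
    using only additions and bit flipping."""
    # Accumulator wide enough that the signed result cannot overflow.
    width = bits + len(pixels).bit_length() + 1
    mask = (1 << width) - 1
    acc = 0
    minus_ones = 0
    for x, a in zip(pixels, coeffs):
        if a == 1:
            acc += x
        elif a == -1:
            acc += ~x & mask      # bit-flipped pixel data
            minus_ones += 1       # remember the deferred "+1"
    # Add the number of -1 coefficients once, at the lowest bit.
    acc = (acc + minus_ones) & mask
    # Interpret the accumulator as a signed two's complement value.
    if acc >> (width - 1):
        acc -= 1 << width
    return acc
```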

When image processing is performed for a large
block formed of 2×2 small blocks, each of which is
further formed of 4×4 pixels, and the data of the
pixels in the large block are multiplied by
coefficients, the pixel data rows of the small blocks
A_ij, B_ij, C_ij, and D_ij, for example, are read out
from the SRAMs A, B, C, and D, respectively, and
multiplied by the coefficients forming the kernel C1,
as shown in Fig. 5.

The above-described computation process is
performed for each pixel by the kernel register 21,
which rearranges the coefficients into a
predetermined order, that is to say, shifts the
kernel C1 relative to the pixel data within the large
block. In other words, the addresses of the SRAMs A
to D are not modified during the series of
computations, and each pixel data row read out from
the SRAMs A to D (pixel data rows of 64 pixels
altogether, forming the large block) remains
constant, whilst the coefficient matrix is converted.
Accordingly, for example, the computation with the
shifted kernel C1 as shown in Fig. 5 is essentially
equivalent to applying a kernel C2 to the pixel data
in the large block. Here, since multiplication is
required only for the kernel C1, in the example
shown, all the remaining part of the 8×8 map
excluding the kernel C1 may be set to 0 (zero).
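
The idea of holding the pixel data fixed and moving
only the coefficients may be sketched as follows. Both
function names are hypothetical, and the 8×8 map is
modeled simply as a list of lists padded with zeros
outside the kernel.

```python
def shifted_coefficient_map(kernel, dy, dx, size=8):
    """Place the kernel at offset (dy, dx) in a size x size map
    whose remaining entries are 0."""
    n1, n2 = len(kernel), len(kernel[0])
    if dy < 0 or dx < 0 or dy + n1 > size or dx + n2 > size:
        raise ValueError("kernel does not fit inside the map")
    cmap = [[0] * size for _ in range(size)]
    for r in range(n1):
        for c in range(n2):
            cmap[dy + r][dx + c] = kernel[r][c]
    return cmap

def apply_map(large_block, cmap):
    """Multiply the fixed large block by the map element-wise
    and sum the products."""
    return sum(a * x
               for row_a, row_x in zip(cmap, large_block)
               for a, x in zip(row_a, row_x))
```

Calling apply_map repeatedly with different offsets
reuses the same large_block data each time; only the
coefficient map changes.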

As described above, when the computational
processing is performed for all the necessary pixels,
the image processing device according to the present
embodiment accesses each pixel data in each large
block only once, and the computed results for all the
necessary pixels can be obtained without modifying
the addresses of the memory cells, simply by shifting
the coefficient matrix. In this way, high-speed
processing of extremely high efficiency can be
attained.

In the following, an optimal relationship between 
the size of a kernel and small/large blocks will be 
described.

As shown in Fig. 6, where a small block is formed
of m_1 × m_2 pixel data, a large block is formed of
l_1 × l_2 small blocks, and the coefficient matrix of
the kernel C1 is formed of n_1 × n_2 coefficients, the
size of the kernel C1 is determined so as to fulfill:

n_1 ≤ m_1 (l_1 - 1) + 1

and

n_2 ≤ m_2 (l_2 - 1) + 1.

In the example of Fig. 5, m_1 × m_2 is 4×4, and
l_1 × l_2 is 2×2. The size of the kernel C1, or
n_1 × n_2, is 5×5 or smaller (in the example of Fig. 5,
n_1 × n_2 = 5×5). Incidentally, with a structure as
shown in Fig. 5, regardless of where the kernel is
located while being shifted within the large block,
the pertinent data can be simultaneously accessed
without fail by the memory cell corresponding to each
small block.
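
One reading of the above size condition can be checked
with a short sketch along a single dimension. It
assumes, as an illustration only, that consecutive
small blocks are assigned to memory cells cyclically
(block index modulo l), and it verifies that a kernel
of n coefficients never needs two different small
blocks from the same memory cell. The function names
are hypothetical.

```python
def max_kernel_size(m, l):
    """Largest kernel extent allowed by n <= m (l - 1) + 1."""
    return m * (l - 1) + 1

def banks_touched(n, m, l, offset):
    """Memory cells (block index mod l) covered by a kernel of
    extent n whose first coefficient sits at pixel 'offset'."""
    first = offset // m
    last = (offset + n - 1) // m
    return [b % l for b in range(first, last + 1)]

def conflict_free(n, m, l):
    """True if, at every offset, each memory cell is needed for
    at most one small block."""
    for offset in range(m * l):
        banks = banks_touched(n, m, l, offset)
        if len(banks) != len(set(banks)):
            return False
    return True
```

With m = 4 and l = 2, a kernel extent of 5 passes this
check at every offset, whereas an extent of 6 can span
three small blocks and therefore fails.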

The results of multiplication thus obtained for
each pixel data are added up for each memory cell in
the adding section 13 provided in the neighborhood of
each memory cell. Because each adding section 13
produces its own computed result, only compressed
partial results need to be transferred. Since the
data volume of a coefficient is in general less than
that of a pixel, the overall data traffic can be
reduced: pixel data are not transferred out of the
memory cell; instead, coefficients are transferred to
the memory cell, and only the result computed and
compressed in the neighborhood of the memory cell is
transferred back from the memory cell.

For example, as shown in Fig. 5, where the kernel
C1 shifts for computation, a pixel data row
{X_1, X_2, ..., X_16} read out from the SRAM A is
multiplied by a coefficient matrix {A_i,j}
(i, j = 1 to 5), and the products are added up by a
high-speed CSA (Carry Save Adder), which does not
propagate carries, as shown in Fig. 7. Incidentally,
the coefficients -1 and 1 are realized by bit
flipping and "AND", respectively, and multi-valued
logic of three values is used for the data transfer
bus in order to transfer the coefficients -1, 0, and
1.
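
The principle of a carry save adder may be sketched as
follows for non-negative integers. This is a generic
textbook 3-to-2 compressor written for illustration;
it is not taken from Fig. 7, and the function names
are hypothetical.

```python
def carry_save_add(a, b, c):
    """Reduce three operands to a sum word and a carry word
    without propagating carries between bit positions."""
    s = a ^ b ^ c                               # per-bit sum
    carry = ((a & b) | (b & c) | (a & c)) << 1  # per-bit carry
    return s, carry

def csa_sum(values):
    """Add many non-negative values, keeping the running total in
    carry-save form and resolving carries only once at the end."""
    s, carry = 0, 0
    for v in values:
        s, carry = carry_save_add(s, carry, v)
    return s + carry  # single carry-propagating addition
```

Because each step is purely bitwise, its delay does
not depend on the word length, which is why such adders
suit the summation of many products.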

The result of the above computation is as
follows:

Computed result
= 0×X_1 + 0×X_2 + ... + A_1,1×X_11 + A_1,2×X_12 + ...
  + A_2,1×X_15 + A_2,2×X_16
= 0×X_11 + 1×X_12 - 1×X_15 + 0×X_16
= X_12 - X_15

Subsequently, the computed results of the adding
sections 13 are added up in the entire adding section
14 to obtain a sum, which is outputted as the result
of the computation for a certain pixel.
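
The two-stage addition may be sketched as follows. The
function names and the dictionary keyed by "A" to "D"
are assumptions made for illustration; the sketch only
shows that each memory cell contributes one partial
sum and that the partial sums are then added together.

```python
def add_section(pixel_row, coeff_row):
    """Per-cell stage (adding section 13): sum of products for
    one memory cell's pixel data row."""
    return sum(a * x for a, x in zip(coeff_row, pixel_row))

def entire_add_section(partials):
    """Final stage (entire adding section 14): sum of the
    per-cell partial results."""
    return sum(partials)

def process_pixel(rows, coeffs):
    """rows and coeffs map each cell name 'A'..'D' to 16 values.
    Returns the final result and the four partial results."""
    partials = [add_section(rows[k], coeffs[k]) for k in "ABCD"]
    return entire_add_section(partials), partials
```

Only the four partial results, not the 64 pixel values,
need to travel to the final stage.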

It should be noted that whilst the coefficient
matrix of the kernel C1 is configured as shown in
Figs. 4 and 5 in the present embodiment, such
configurations are not restrictive and various types
of coefficients may be applied. One example thereof
is shown in Fig. 8, in which (a) is a 3×3 smoothing
(mean) filter, (b) is a 5×5 smoothing (mean) filter,
(c) is a 5×5 vertical edge extracting filter, and (d)
is a Gaussian filter, in each of which the processing
result is shown on the left side of the kernel. In
the case of (d), the coefficients necessary for the
Gaussian filter, which are more complex than those
for the other filters, are realized with a simple
combination of three kernels as shown.

The image processing device and image processing
method of the present invention can thus process
images without loss, using a relatively simple
combination of equipment, in an extremely short time,
and with low power consumption.

The present embodiments are to be considered in
all respects as illustrative and not restrictive, and
all changes which come within the meaning and range
of equivalency of the claims are therefore intended
to be embraced therein. The invention may be
embodied in other specific forms without departing
from the spirit or essential characteristics thereof.